Top-Down Cohesion Segmentation in Summarization

Authors

  • Doina Tatar
  • Andreea Diana Mihis
  • Gabriela Serban Czibula
Abstract

The paper proposes a new method of linear text segmentation based on the lexical cohesion of a text. First, a single chain of disambiguated words in the text is established; then the rips in this chain are taken as the boundaries of the segments in the cohesion text structure (Cohesion TextTiling, or CTT). Summaries of arbitrary length are obtained by extraction, using three different methods applied to the resulting segments. The informativeness of these summaries is compared with that of the paired summaries of the same length obtained using an earlier method of logical segmentation by text entailment (Logical TextTiling, or LTT). Experiments with the CTT and LTT methods are carried out on four "classical" texts from the summarization literature, showing that the quality of summarization using cohesion segmentation (CTT) is better than the quality using logical segmentation (LTT).
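The idea of cutting a text where a lexical chain "rips" can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not define the chain or the rips precisely, so the sketch assumes that sentences arrive as lists of already-disambiguated tokens, that a chain link joins consecutive occurrences of the same token, and that a rip is a gap between sentences crossed by at most `max_links` links. The function names and the first-sentence extraction rule are hypothetical and do not correspond to any of the paper's three extraction methods.

```python
from typing import Dict, List


def segment_by_chain_rips(sentences: List[List[str]], max_links: int = 0) -> List[List[int]]:
    """Group sentence indices into segments, cutting at chain 'rips'.

    Assumption (not from the paper): a rip is a gap between adjacent
    sentences that is crossed by at most `max_links` chain links.
    """
    n = len(sentences)
    if n == 0:
        return []
    # crossing[g] counts the links spanning the gap between sentence g and g + 1.
    crossing = [0] * (n - 1)
    last_seen: Dict[str, int] = {}
    for i, sentence in enumerate(sentences):
        for token in set(sentence):
            if token in last_seen:
                # A link from the previous occurrence to this one covers
                # every gap in between.
                for g in range(last_seen[token], i):
                    crossing[g] += 1
            last_seen[token] = i
    segments = [[0]]
    for g in range(n - 1):
        if crossing[g] <= max_links:
            segments.append([g + 1])    # rip: start a new segment
        else:
            segments[-1].append(g + 1)  # chain intact: extend the segment
    return segments


def first_sentence_summary(segments: List[List[int]]) -> List[int]:
    """Hypothetical extraction rule: keep the first sentence of each segment."""
    return [segment[0] for segment in segments]


if __name__ == "__main__":
    toy_text = [
        ["cat", "sat"],
        ["cat", "purred"],
        ["stock", "fell"],
        ["stock", "rose", "market"],
        ["market", "closed"],
    ]
    toy_segments = segment_by_chain_rips(toy_text)
    print(toy_segments)
    print(first_sentence_summary(toy_segments))
```

In this toy input no token links the second sentence to the third, so the only rip falls between them. Raising `max_links` makes the segmentation more eager to cut, which is one simple way to vary the number of segments and hence the summary length.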

Similar resources

Disunity in Cohesion: How Purpose Affects Methods and Results When Analyzing Lexical Cohesion

Lexical Cohesion is a commonly studied linguistic feature as it is easily identified from the surface of a text. However, the purposes for studying lexical cohesion are varied, and each purpose requires different methods. This study analyzes two short movie review texts for four different research purposes using lexical cohesion: text evaluation, text segmentation, text summarization, and text ...

Improving Text Segmentation with Non-systematic Semantic Relation

Text segmentation is a fundamental problem in natural language processing, with applications in information retrieval, question answering, and text summarization. Almost all previous work on unsupervised text segmentation is based on the assumption of lexical cohesion, which is indicated by relations between words in the two units of text. However, they only take into account the reiteration,...

Implementation of an Automated Text Segmentation System Using Hearst’s Texttiling Algorithm

This paper describes the implementation of a text segmentation system based on Hearst’s TextTiling algorithm. Hearst is a pioneer in the field of text segmentation, and her algorithm has already been shown to provide good results. The algorithm uses lexical frequency and distribution information to recognize the level of cohesion between blocks of text, and then uses these cohesion estimates to...

Generating Reference Texts for Short Answer Scoring Using Graph-based Summarization

Automated scoring of short answers often involves matching a student's response against one or more sample reference texts. Each reference text provided contains very specific instances of correct responses and may not cover the variety of possibly correct responses. Finding or hand-creating additional references can be very time-consuming and expensive. In order to overcome this problem we prop...

Lexical cohesion, discourse segmentation and document summarization

Summaries automatically derived by sentence extraction are known to exhibit some coherence degradation, readability deterioration, and topical under-representation. We propose a strategy for improving upon these problems, aiming to generate more cohesive summaries by analyzing the lexical cohesion factors in the source document texts. As an initial experiment, we have looked at one particular f...

Publication date: 2008